Predicting Lexical Norms Using a Word Association Corpus

Authors

  • Hendrik Vankrunkelsven
  • Steven Verheyen
  • Simon De Deyne
  • Gerrit Storms
Abstract

Obtaining norm scores for subjective properties of words can be quite cumbersome, as it requires an investment proportional to the size of the word set. We present a method to predict norm scores for large word sets from a word association corpus. We use similarities between word pairs, derived from this corpus, to construct a semantic space. Starting from norm scores for a subset of the words, we retrieve the direction in the space that optimally reflects the norm data associated with the words. All other words in the semantic space are then orthogonally projected onto this direction, which provides predictions for those words on the variable of interest. In this study, we predict valence, arousal, dominance, age of acquisition, and concreteness and show that the predictions correlate strongly with the judgments of human raters. Furthermore, we show that our predictions are superior to those derived using other methods.
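The projection step described in the abstract can be illustrated with a minimal sketch. The abstract does not say how the optimal direction is estimated, so ordinary least squares is used here as an assumption. The function names `fit_direction` and `project`, the synthetic coordinates, and the synthetic norm scores are all hypothetical and stand in for a real semantic space derived from word association data.

```python
import numpy as np


def fit_direction(coords, scores):
    """Estimate a direction in the space from words with known norm scores.

    Assumption: the direction is the weight vector of a least-squares
    regression of the scores on the coordinates (with an intercept).
    """
    design = np.column_stack([coords, np.ones(len(coords))])
    beta, *_ = np.linalg.lstsq(design, scores, rcond=None)
    return beta[:-1], beta[-1]  # weights (direction), intercept


def project(coords, weights):
    """Orthogonally project points onto the line spanned by `weights`.

    Returns the signed position of each point along the unit direction.
    """
    unit = weights / np.linalg.norm(weights)
    return coords @ unit


# Synthetic stand-in for a semantic space: 200 "words" in 5 dimensions.
rng = np.random.default_rng(0)
space = rng.normal(size=(200, 5))
hidden_direction = rng.normal(size=5)
norms = space @ hidden_direction + rng.normal(scale=0.1, size=200)

# Words 0-99 have known norm scores; words 100-199 are to be predicted.
train, test = np.arange(100), np.arange(100, 200)
weights, intercept = fit_direction(space[train], norms[train])
predicted = project(space[test], weights)

# Agreement between projections and the held-out scores.
r = np.corrcoef(predicted, norms[test])[0, 1]
print(f"correlation on held-out words: {r:.3f}")
```

The projections are on an arbitrary scale, which is enough for a correlation with human ratings. To obtain predictions in the units of the original norms under this regression assumption, one could instead compute `coords @ weights + intercept`.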

Similar resources

Comparing Lexical Relationships Observed within Japanese Collocation Data and Japanese Word Association Norms

While large-scale corpora and various corpus query tools have long been recognized as essential language resources, the value of word association norms as language resources has been largely overlooked. This paper conducts some initial comparisons of the lexical relationships observed within Japanese collocation data extracted from a large corpus using the Japanese language version of the Sketc...

Collecting and Exploring Everyday Language for Predicting Psycholinguistic Properties of Words

Exploring language usage through frequency analysis in large corpora is a defining feature in most recent work in corpus and computational linguistics. From a psycholinguistic perspective, however, the corpora used in these contributions are often not representative of language usage: they are either domain-specific, limited in size, or extracted from unreliable sources. In an effort to address...

Coling 2008: 22nd International Conference on Computational Linguistics

While large-scale corpora and various corpus query tools have long been recognized as essential language resources, the value of word association norms as language resources has been largely overlooked. This paper conducts some initial comparisons of the lexical relationships observed within Japanese collocation data extracted from a large corpus using the Japanese language version of the Sketc...

Using Web Corpora for the Automatic Acquisition of Lexical-Semantic Knowledge

This article presents two case studies to explore whether and how web corpora can be used to automatically acquire lexical-semantic knowledge from distributional information. For this purpose, we compare three German web corpora and a traditional newspaper corpus on modelling two types of semantic relatedness: (1) Assuming that free word associations are semantically related to their stimuli, w...

Developing a Corpus-Based Word List in Pharmacy Research Articles: A Focus on Academic Culture

The present corpus-based lexical study reports the development of a Pharmacy Academic Word List (PAWL); a list of the most frequent words from a corpus of 3,458,445 tokens made up of 800 most recent pharmacy texts including research articles, review articles, and short communications in four sub-disciplines of pharmacy. WordSmith (Scott, 2017) and AntWordProfiler (Anthony, 2014) were used to sc...

Publication date: 2015